June 5th, 2021

Let's Predict Wine Quality

Disclaimer: I am NOT a wine expert!

How can we develop a wine quality prediction program?

Programming vs. Machine Learning

Machine Learning (supervised)

Step 1: Find data

Wine Dataset

  • About 6,500 red and white Portuguese "Vinho Verde" wines
  • Features: 11 physicochemical properties
  • Quality assessed by blind tasting, from 0 (very bad) to 10 (excellent)

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

Wine Quality Distribution

Step 2: Apply Machine Learning

Linear Regression

Decision Tree

Random Forest

Comparing Model Performance

Model performance was measured as mean absolute error using 5-fold cross-validation.

Model               Mean Absolute Error
Random Forest       0.44
Linear Regression   0.57
Decision Tree       0.60

\[MAE(y, f(X)) = \frac{1}{n}\sum_{i=1}^n \left|y^{(i)} - f(x^{(i)})\right|\]
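The cross-validated MAE in the table can be sketched in plain Python as below. This is a minimal sketch, not the code behind the numbers above: the `fit(X_train, y_train)` interface (which returns a prediction function) and the baseline learner `fit_mean` are hypothetical stand-ins for the real models.

```python
import random
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error: average of |y - f(x)| over all instances."""
    return mean(abs(y - p) for y, p in zip(y_true, y_pred))


def kfold_mae(X, y, fit, k=5, seed=0):
    """Average test-fold MAE over k folds.

    `fit(X_train, y_train)` is a hypothetical interface that trains a model
    and returns a function mapping one feature row to a predicted quality.
    """
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)          # random fold assignment
    folds = [idx[i::k] for i in range(k)]     # k disjoint test folds
    scores = []
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(mae([y[i] for i in test], [predict(X[i]) for i in test]))
    return mean(scores)


def fit_mean(X_train, y_train):
    """Hypothetical baseline learner: always predicts the mean training quality."""
    m = mean(y_train)
    return lambda row: m


print(mae([5, 6, 7], [5.5, 6, 6]))  # (0.5 + 0 + 1) / 3 = 0.5
```

A real model would replace `fit_mean`; every model is scored on the same folds, so the errors are comparable.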

Prediction vs. Actual Quality

Step 3: Profit

We want to know:

  • Which wine properties are the most predictive for quality?
  • How does a property affect the predicted wine quality?
  • Can we extract a "Rule of Thumb" from the black box?
  • Why did a wine get a certain prediction?
  • How would we have to change a wine to get a different prediction?

Looking inside the black box

Which features are important?

Permutation Feature Importance

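  1. Measure model error on test data (e.g. MAE)
  2. Randomly shuffle (permute) the values of the feature of interest
  3. Measure model error again on the permuted data
  4. Compute the difference or ratio of the two errors: the larger the increase in error, the more important the feature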
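A minimal plain-Python sketch of the four steps, assuming a hypothetical `predict` function that maps one feature row (a list) to a predicted quality, and using MAE as the error. It shuffles once with a fixed seed; repeating the shuffle and averaging would give a more stable estimate.

```python
import random
from statistics import mean


def mae(y_true, y_pred):
    return mean(abs(y - p) for y, p in zip(y_true, y_pred))


def permutation_importance(predict, X, y, feature, ratio=False, seed=0):
    """Importance of column `feature`, following steps 1-4.

    `predict` is a hypothetical function mapping one row (a list) to a prediction.
    """
    # 1. Error on the unchanged test data
    base_error = mae(y, [predict(row) for row in X])
    # 2. Shuffle the feature's column, breaking its link to the target
    column = [row[feature] for row in X]
    random.Random(seed).shuffle(column)
    X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
    # 3. Error again on the permuted data
    perm_error = mae(y, [predict(row) for row in X_perm])
    # 4. Ratio (undefined if base_error is 0) or difference of the two errors
    return perm_error / base_error if ratio else perm_error - base_error


def toy_model(row):
    return 2 * row[0]   # hypothetical model that never reads feature 1


X = [[i, i % 3] for i in range(10)]
y = [2 * i for i in range(10)]
print(permutation_importance(toy_model, X, y, feature=1))  # 0
```

Because the toy model never reads feature 1, shuffling it leaves the error unchanged, so its importance is 0.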

How do features affect predictions?

ICE and PDP

ICE: Individual Conditional Expectation curves

PDP: Partial Dependence Plots

  1. Choose a feature
  2. Define a grid of values along that feature
  3. For each value of the grid
    1. For all data points, replace the feature value with the grid value
    2. Get the model predictions
  4. For each data point, draw a line through its predictions along the grid -> ICE
  5. Average the lines -> PDP

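\[PDP(x) = \frac{1}{n} \sum_{i=1}^n f(x, x_C^{(i)})\]

where \(x\) is a grid value of the chosen feature and \(x_C^{(i)}\) are the values of the remaining features of data point \(i\).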
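The recipe and formula above can be sketched in plain Python. The `predict` function and `toy_model` are hypothetical, and the grid is passed in explicitly rather than chosen automatically (for example from quantiles of the feature).

```python
from statistics import mean


def ice_and_pdp(predict, X, feature, grid):
    """Return (ICE curves, PDP) for column `feature` over the values in `grid`."""
    # predictions[g][i]: prediction for data point i with the feature set to grid[g]
    predictions = []
    for value in grid:                                                      # step 3
        X_mod = [row[:feature] + [value] + row[feature + 1:] for row in X]  # 3.1
        predictions.append([predict(row) for row in X_mod])                 # 3.2
    ice = [list(curve) for curve in zip(*predictions)]  # step 4: one curve per data point
    pdp = [mean(preds) for preds in predictions]        # step 5: average the curves
    return ice, pdp


def toy_model(row):
    return row[0] + row[0] * row[1]   # hypothetical model with an interaction


X = [[0, 1], [0, 2], [0, 3]]
ice, pdp = ice_and_pdp(toy_model, X, feature=0, grid=[0, 1, 2])
print(ice)  # [[0, 2, 4], [0, 3, 6], [0, 4, 8]]
print(pdp)  # [0, 3, 6]
```

The ICE curves have different slopes because the toy model's effect of feature 0 depends on feature 1; the PDP averages this heterogeneity away.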

Effect of Alcohol

Effect of Volatile Acidity

How do features affect predictions?

Interactions

Interactions (H-Statistic)

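The H-statistic tells us how much of the joint effect of two features \(j\) and \(k\) is due to their interaction, with all partial dependence functions centered to mean zero:

\[H^2_{jk}= \frac{\sum_{i=1}^n\left[PD_{jk}(x_j^{(i)},x_k^{(i)})-PD_j(x_j^{(i)})-PD_k(x_k^{(i)})\right]^2}{\sum_{i=1}^n PD^2_{jk}(x_j^{(i)},x_k^{(i)})}\]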
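A brute-force sketch of the formula, assuming a hypothetical `predict` function and evaluating each centered partial dependence function at the observed data points. It needs on the order of n² model calls, so for a dataset of this size one would likely subsample.

```python
from statistics import mean


def pd_at(predict, X, features, values):
    """Partial dependence at one point: fix `features` to `values` in every row, average."""
    preds = []
    for row in X:
        modified = list(row)
        for f, v in zip(features, values):
            modified[f] = v
        preds.append(predict(modified))
    return mean(preds)


def centered(values):
    m = mean(values)
    return [v - m for v in values]


def h_squared(predict, X, j, k):
    """Friedman's H^2 for features j and k; needs O(n^2) model calls."""
    pd_jk = centered([pd_at(predict, X, (j, k), (x[j], x[k])) for x in X])
    pd_j = centered([pd_at(predict, X, (j,), (x[j],)) for x in X])
    pd_k = centered([pd_at(predict, X, (k,), (x[k],)) for x in X])
    numerator = sum((a - b - c) ** 2 for a, b, c in zip(pd_jk, pd_j, pd_k))
    denominator = sum(a ** 2 for a in pd_jk)   # zero if j and k have no joint effect
    return numerator / denominator


X = [[-1, -1], [-1, 1], [1, -1], [1, 1]]
print(h_squared(lambda r: r[0] + 2 * r[1], X, 0, 1))  # 0.0: purely additive
print(h_squared(lambda r: r[0] * r[1], X, 0, 1))      # 1.0: pure interaction
```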

Rule of thumb for wine quality?

Surrogate Model

The decision-tree surrogate explains 53.25% of the variance in the black-box predictions (R² = 0.5325).
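A global surrogate is trained on the black box's predictions instead of the true quality, and the percentage above is the R² between the two sets of predictions. The sketch below assumes a hypothetical `black_box` function and, for brevity, uses a depth-1 tree (a single split) as the surrogate; a deeper tree would be grown by applying the same split search recursively.

```python
from statistics import mean


def fit_stump(X, targets):
    """Depth-1 regression tree: the single split that minimizes squared error."""
    best = None
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X})[:-1]:
            left = [t for row, t in zip(X, targets) if row[f] <= threshold]
            right = [t for row, t in zip(X, targets) if row[f] > threshold]
            m_left, m_right = mean(left), mean(right)
            sse = (sum((t - m_left) ** 2 for t in left)
                   + sum((t - m_right) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, f, threshold, m_left, m_right)
    _, f, threshold, m_left, m_right = best   # assumes at least one feature varies
    return lambda row: m_left if row[f] <= threshold else m_right


def r_squared(black_box_preds, surrogate_preds):
    """Share of the black-box prediction variance that the surrogate reproduces."""
    m = mean(black_box_preds)
    sse = sum((b - s) ** 2 for b, s in zip(black_box_preds, surrogate_preds))
    sst = sum((b - m) ** 2 for b in black_box_preds)
    return 1 - sse / sst


def black_box(row):
    return 5 + 2 * (row[0] > 1) + 0.1 * row[1]   # hypothetical black-box model


X = [[0, 0], [1, 1], [2, 0], [3, 1]]
targets = [black_box(row) for row in X]   # fit to the model's predictions, not true quality
surrogate = fit_stump(X, targets)
print(r_squared(targets, [surrogate(row) for row in X]))
```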

Explain individual predictions

Shapley Value
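A minimal sketch of exact Shapley values, assuming a hypothetical `predict` function and a small background dataset that supplies the values of "absent" features (one common choice of value function among several). Enumerating all coalitions costs 2^(p-1) terms per feature, so for larger p sampling-based approximations are typically used.

```python
from itertools import combinations
from math import factorial
from statistics import mean


def coalition_value(predict, x, background, coalition):
    """Average prediction with features in `coalition` fixed to x's values and
    all other features taken from each background row."""
    preds = []
    for z in background:
        mixed = [x[f] if f in coalition else z[f] for f in range(len(x))]
        preds.append(predict(mixed))
    return mean(preds)


def shapley_values(predict, x, background):
    """Exact Shapley values by enumerating every coalition (small p only)."""
    p = len(x)
    phi = []
    for j in range(p):
        others = [f for f in range(p) if f != j]
        total = 0.0
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(predict, x, background, s | {j})
                        - coalition_value(predict, x, background, s))
                total += weight * gain
        phi.append(total)
    return phi


def toy_model(row):
    return 2 * row[0] + 3 * row[1]   # hypothetical linear model


background = [[0, 0], [2, 2]]
x = [3, 1]
print(shapley_values(toy_model, x, background))  # [4.0, 0.0]
```

The values add up to the prediction for `x` (9) minus the average prediction over the background data (5), which is the efficiency property of Shapley values.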

Explain best wine

Explain worst wine

How Can the Worst Wine be Improved?

Counterfactual Explanations

Improve worst wine?

How do we get the wine above a predicted quality of 5?

  • Decreasing volatile acidity to 0.2 g/dm3 yields a predicted quality of 5.25
  • Decreasing volatile acidity to 1.0 g/dm3 and increasing alcohol to 13 vol.% yields a predicted quality of 5.30
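One simple way to search for such changes is a brute-force grid search: try candidate values for a few features and keep the closest changed instance whose prediction crosses the target. This is an illustrative sketch, not the method used to produce the results above; `toy_model`, the candidate grids, and the distance scaling are all hypothetical.

```python
from itertools import product


def find_counterfactual(predict, x, candidates, target, scales):
    """Brute-force counterfactual search over a small grid (illustrative only).

    candidates: {feature index: values to try}
    scales: {feature index: scale that makes changes comparable, e.g. the range}
    Returns the changed instance whose prediction exceeds `target` with the
    smallest scaled L1 distance to x, or None if no candidate qualifies.
    """
    features = list(candidates)
    best, best_dist = None, float("inf")
    for values in product(*(candidates[f] for f in features)):
        x_new = list(x)
        for f, v in zip(features, values):
            x_new[f] = v
        if predict(x_new) > target:
            dist = sum(abs(x_new[f] - x[f]) / scales[f] for f in features)
            if dist < best_dist:
                best, best_dist = x_new, dist
    return best


def toy_model(row):
    return row[1] - row[0]   # hypothetical model


x = [5, 3]   # toy_model(x) == -2
print(find_counterfactual(toy_model, x, {0: [5, 4, 3, 2], 1: [3, 4, 5]},
                          target=0, scales={0: 1, 1: 2}))  # [4, 5]
```

Dividing by a per-feature scale keeps a feature with large raw values from dominating the distance; practical counterfactual methods usually also reward sparse and plausible changes.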

Why interpretability?

Interested in learning more?

Units in Wine dataset

  • fixed acidity: g(tartaric acid)/dm3
  • volatile acidity: g(acetic acid)/dm3
  • citric acid: g/dm3
  • residual sugar: g/dm3
  • chlorides: g(sodium chloride)/dm3
  • free sulfur dioxide: mg/dm3
  • total sulfur dioxide: mg/dm3
  • density: g/cm3
  • pH
  • sulphates: g(potassium sulphate)/dm3
  • alcohol: vol.%